In an era of countless content offerings, recommender systems alleviate information overload by providing users with personalized content suggestions. Due to the scarcity of explicit user feedback, modern recommender systems typically optimize for the same fixed combination of implicit feedback signals across all users. However, this approach disregards a growing body of work highlighting that (i) implicit signals can be used by users in diverse ways, signaling anything from satisfaction to active dislike, and (ii) different users communicate preferences in different ways. We propose applying the recent Interaction Grounded Learning (IGL) paradigm to address the challenge of learning representations of diverse user communication modalities. Rather than taking a fixed, human-designed reward function, IGL is able to learn personalized reward functions for different users and then optimize directly for the latent user satisfaction. We demonstrate the success of IGL with experiments using simulations as well as with real-world production traces.
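Consider the problem setting of Interaction-Grounded Learning (IGL), in which a learner's goal is to interact optimally with the environment with no explicit reward to ground its policies. The agent observes a context vector, takes an action, and receives a feedback vector, and uses this information to optimize a policy with respect to a latent reward function. Previously analyzed approaches fail when the feedback vector contains the action, which significantly limits IGL's success in many potential scenarios, such as brain-computer interface (BCI) or human-computer interface (HCI) applications. We address this by creating an algorithm and analysis that allow IGL to work even when the feedback vector contains the action, encoded in any fashion. We provide theoretical guarantees and large-scale experiments based on supervised datasets to demonstrate the effectiveness of the new approach.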
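We propose a new framework for imitation learning: treating imitation as a two-player ranking-based game between a policy and a reward. In this game, the reward agent learns to satisfy pairwise performance rankings between behaviors, while the policy agent learns to maximize this reward. In imitation learning, near-optimal expert data can be difficult to obtain, and even in the limit of infinite data it cannot imply a total ordering over trajectories as preferences can. On the other hand, learning from preferences alone is challenging, because a large number of preferences are required to infer a high-dimensional reward function, although preference data is typically much easier to collect than expert demonstrations. The classical inverse reinforcement learning (IRL) formulation learns from expert demonstrations but provides no mechanism for incorporating learning from offline preferences, and vice versa. We instantiate the proposed ranking-game framework with a novel ranking loss, yielding an algorithm that can learn simultaneously from expert demonstrations and preferences and thereby gains the advantages of both modalities. Our experiments show that the proposed method achieves state-of-the-art sample efficiency and can solve previously unsolvable tasks in the Learning from Observation (LfO) setting.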
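Human gaze is known to be a strong indicator of underlying human intentions and goals during manipulation tasks. This work studies the gaze patterns of human teachers demonstrating tasks to robots and proposes ways in which such patterns can be used to enhance robot learning. Using both kinesthetic teaching and video demonstrations, we identify novel intention-revealing gaze behaviors during teaching. These prove to be informative in a variety of problems, ranging from reference frame inference to segmentation of multi-step tasks. Based on our findings, we propose two proof-of-concept algorithms which show that gaze data can enhance subtask classification for a multi-step task by up to 6%, and reward inference and policy learning for a single-step task by up to 67%. Our findings provide a foundation for a model of natural human gaze in robot learning-from-demonstration settings and present open problems in utilizing human gaze to enhance robot learning.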
We introduce MuJoCo MPC (MJPC), an open-source, interactive application and software framework for real-time predictive control, based on MuJoCo physics. MJPC allows the user to easily author and solve complex robotics tasks, and currently supports three shooting-based planners: derivative-based iLQG and Gradient Descent, and a simple derivative-free method we call Predictive Sampling. Predictive Sampling was designed as an elementary baseline, mostly for its pedagogical value, but turned out to be surprisingly competitive with the more established algorithms. This work does not present algorithmic advances, and instead, prioritises performant algorithms, simple code, and accessibility of model-based methods via intuitive and interactive software. MJPC is available at: github.com/deepmind/mujoco_mpc, a video summary can be viewed at: dpmd.ai/mjpc.
Climate change is expected to aggravate wildfire activity through the exacerbation of fire weather. Improving our capabilities to anticipate wildfires on a global scale is of utmost importance for mitigating their negative effects. In this work, we create a global fire dataset and demonstrate a prototype for predicting the presence of global burned areas on a sub-seasonal scale with the use of segmentation deep learning models. In particular, we present an open-access global analysis-ready datacube, which contains a variety of variables related to the seasonal and sub-seasonal fire drivers (climate, vegetation, oceanic indices, human-related variables), as well as the historical burned areas and wildfire emissions for 2001-2021. We train a deep learning model, which treats global wildfire forecasting as an image segmentation task and skillfully predicts the presence of burned areas 8, 16, 32 and 64 days ahead of time. Our work motivates the use of deep learning for global burned area forecasting and paves the way towards improved anticipation of global wildfire patterns.
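Anomaly detection using machine learning techniques has emerged as a novel and powerful tool in the search for new physics beyond the Standard Model. Historically, as with the development of jet observables, theoretical consistency has not always played a central role in the rapid development of algorithms and neural network architectures. In this work, we construct an infrared and collinear safe autoencoder based on graph neural networks by employing energy-weighted message passing. We demonstrate that, while this approach has theoretically favorable properties, it also exhibits strong sensitivity to non-QCD structures.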
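The Covid-19 pandemic has been a scourge upon humanity, claiming the lives of more than 5 million people worldwide. Although vaccines are being distributed worldwide, there is an apparent need for affordable screening techniques to serve parts of the world that do not have access to traditional medicine. Artificial intelligence can provide a solution that uses cough sounds as the primary screening modality. This paper presents multiple models that achieve relatively respectable performance on the largest evaluation dataset currently presented in the academic literature. Moreover, we show that performance increases with training data size, indicating the need for worldwide data collection to help combat the Covid-19 pandemic with non-traditional means.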
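Federated Learning (FL) departs from the norm of "send data to the model" in favor of "send the model to the data". When it is used in an edge ecosystem, numerous heterogeneous edge devices, which collect data through different means and are connected through different network channels, take part in the training process. Failure of edge devices in such an ecosystem, due to device faults or network issues, is highly likely. In this paper, we first analyze the impact of the number of edge devices on an FL model and provide a strategy for selecting the optimal devices that contribute to the model. We then observe the impact of failures among the selected devices and provide a mitigation strategy to ensure a robust Federated Learning technique.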